
    SIP: Performance Tuning through Source Code Interdependence

    Abstract. The gap between CPU peak performance and achieved application performance widens as CPU complexity, as well as the gap between CPU cycle time and DRAM access time, increases. While advanced compilers can perform many optimizations to better utilize the cache system, the application programmer is still required to do some of the optimizations needed for efficient execution. Therefore, profiling should be performed on optimized binary code and performance problems reported to the programmer in an intuitive way. Existing performance tools do not have adequate functionality to address these needs. Here we introduce source interdependence profiling, SIP, as a paradigm to collect and present performance data to the programmer. SIP identifies the performance problems that remain after the compiler optimization and gives intuitive hints at the source-code level as to how they can be avoided. Instead of just collecting information about the events directly caused by each source-code statement, SIP also presents data about events from some interdependent statements of source code. A first SIP prototype tool has been implemented. It supports both C and Fortran programs. We describe how the tool was used to improve the performance of the SPEC CPU2000 183.equake application by 59 percent.

    Mechanisms for cooperative shared memory

    This paper explores the complexity of implementing directory protocols by examining their mechanisms: primitive operations on directories, caches, and network interfaces. We compare the following protocols: Dir1B, Dir4B, Dir4NB, DirnNB, Dir1SW and an improved version of Dir1SW (Dir1SW+). The comparison shows that the mechanisms and mechanism sequencing of Dir1SW and Dir1SW+ are simpler than those for other protocols. We also compare protocol performance by running eight benchmarks on 32-processor systems. Simulations show that Dir1SW+'s performance is comparable to more complex directory protocols. The significant disparity in hardware complexity and the small difference in performance argue that Dir1SW+ may be a more effective use of resources. The small performance difference is attributable to two factors: the low degree of sharing in the benchmarks and Check-In/Check-Out (CICO) directives.

    Fast-Cache: A New Abstraction for Memory System Simulation


    Cache profiling and the SPEC benchmarks: A case study
